Alexander H. Liu

Ministral 3

Jan 13, 2026

Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal

Dec 14, 2025

Voxtral

Jul 17, 2025

USAD: Universal Speech and Audio Representation via Distillation

Jun 23, 2025

Magistral

Jun 12, 2025

Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities

Mar 06, 2025

SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction

Nov 25, 2024

A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation

Oct 29, 2024

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Sep 25, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

Sep 24, 2024